[9/25] 所有前綴出現的次數

16th鐵人賽

歐噴禿沃可

團隊晶心壯士III: ALcHemist

2024-09-25 21:06:16

133 瀏覽

分享至

2024/09/25 Leetcode Daily Problem

2416. Sum of Prefix Scores of Strings
難度: 難!

題意

給定一字串陣列words；回傳一整數陣列result，result[i]為words[i]中所有的前綴(prefix)在全部字串陣列words的出現次數總和。

範例1:
Input: words = ["abc","ab","bc","b"]
Output: [5,4,3,2]
Output[0]: "a"出現2次，"ab"出現2次，"abc"出現1次，總合為5。
Output[1]: "a"出現2次，"ab"出現2次，總合為4。
Output[2]: "b"出現2次，"bc"出現1次，總合為3。
Output[3]: "b"出現2次，總合為2。

範例2:
Input: words = ["abcd"]
Output: [4]
Output[0]: "a"出現1次，"ab"出現1次，"abc"出現1次，"abcd"出現1次，總合為4。

思路1

如同昨天的題目[9/24] 兩整數陣列中最長的相同前綴長度
又是反覆查詢字串出現的次數，立刻聯想到用哈希桌做，用two pass的方式求出

建立哈希桌前綴->出現次數，遍歷所有words取出所有的前綴，更新其出現次數。
再遍歷一次所有words，取出前綴於哈希桌查詢次數並累加。

實作1

class Solution
{
public:
    vector<int> sumPrefixScores(vector<string>& words)
    {
        const int n = (int) words.size();
        // prefix -> count
        unordered_map<string, int> um;
        for (auto word : words)
            for (size_t len = 1, sz = word.size(); len <= sz; len++)
            {
                um[word.substr(0, len)]++;
            }

        vector<int> res(n, 0);
        for (int i = 0; i < n; i++)
        {
            for (size_t len = 1, sz = words[i].size(); len <= sz; len++)
            {
                res[i] += um[words[i].substr(0, len)];
            }
        }
        return res;
    }
};

結果1

Time Limit Exceeded 31 / 38 test cases passed.
input: 超級長
顯然事情沒這麼簡單，畢竟是個難題。

實作2

這時候想到G姓同事最近教學的一招string_view
於c++17加入string_view可以不用copy一份新的string，即可查看字串；性能比string快；且string有的operator都有。
於是實驗看看string_view是否能直接當unordered_map的鍵值? 結果可以!

class Solution
{
public:
    vector<int> sumPrefixScores(const vector<string>& words)
    {
        const int n = (int) words.size();
        // prefix -> count
        unordered_map<string_view, int> um;
        for (const auto& word : words)
            for (size_t len = 1, sz = word.size(); len <= sz; len++)
            {
                string_view prefix = string_view(word.data(), len);
                um[prefix]++;
            }

        vector<int> res(n, 0);
        for (int i = 0; i < n; i++)
        {
            for (size_t len = 1, sz = words[i].size(); len <= sz; len++)
            {
                string_view prefix = string_view(words[i].data(), len);
                res[i] += um[prefix];
            }
        }
        return res;
    }
};

結果2

不得不說string_view好像怪怪的，提交第一次竟然TLE，而且測資很不合理。
Time Limit Exceeded 37 / 38 test cases passed.
input: ["bfiaaaaifb","aaaaooaaaa"]

提交第二次就過了，string_view太神辣
Accepted
38/38 cases passed (2723 ms)
Your runtime beats 5.23 % of cpp submissions
Your memory usage beats 87.28 % of cpp submissions (236 MB)

但string_view在我的環境編譯起來超怪，若是用鐵人賽第一篇刷題前準備的編譯方式:

g++ -g -Wall -fsanitize=address -fsanitize=undefined -fsanitize=leak -o a.out cpp/2416.sum-of-prefix-scores-of-strings.cpp && ./a.out

跑出來會在讀取unordered_map的地方噴錯:

==15630==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffdf12efa40 at pc 0x7fc69cbbd4b5 bp 0x7ffdf12ef390 sp 0x7ffdf12eeb38
READ of size 1 at 0x7ffdf12efa40 thread T0
    #0 0x7fc69cbbd4b4 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861
    ...
    ...
    ...
    #12 0x5568c449691c in Solution::sumPrefixScores(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) cpp/2416.sum-of-prefix-scores-of-strings.cpp:102
    #13 0x5568c4493e9c in main cpp/2416.sum-of-prefix-scores-of-strings.cpp:116
    #14 0x7fc69bfc1d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #15 0x7fc69bfc1e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #16 0x5568c44938c4 in _start (/.../leetcode/a.out+0x1a8c4)


Address 0x7ffdf12efa40 is located in stack of thread T0 at offset 304 in frame
    #0 0x5568c44961c9 in Solution::sumPrefixScores(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) cpp/2416.sum-of-prefix-scores-of-strings.cpp:82

  This frame has 8 object(s):
    [32, 33) '<unknown>'
    [48, 52) '<unknown>'
    [64, 72) '__for_begin' (line 88)
    [96, 104) '__for_end' (line 88)
    [128, 144) 'prefix' (line 91)
    [160, 176) 'prefix' (line 101)
    [192, 248) 'um' (line 86)
    [288, 320) 'word' (line 88) <== Memory access at offset 304 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:861 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
  0x10003e255ef0: 00 00 f1 f1 f1 f1 f1 f1 01 f2 00 f2 f2 f2 00 f2
  0x10003e255f00: f2 f2 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x10003e255f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003e255f20: 00 00 f1 f1 f1 f1 f8 f2 f8 f2 f8 f2 f2 f2 f8 f2
  0x10003e255f30: f2 f2 f8 f8 f2 f2 00 00 f2 f2 00 00 00 00 00 00
=>0x10003e255f40: 00 f2 f2 f2 f2 f2 f8 f8[f8]f8 f3 f3 f3 f3 00 00
  0x10003e255f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003e255f60: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 01 f2
  0x10003e255f70: f8 f2 f8 f2 f8 f2 f8 f2 f8 f2 01 f2 01 f2 01 f2
  0x10003e255f80: 01 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 f2
  0x10003e255f90: f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==15630==ABORTING

暫時先當個技術債，因為提交到leetcode不會噴錯，那就沒事了(?

分析1

若words數量為N，每個words[i]的長度為M
時間複雜度: O(N * M)，所有的前綴都要建表、查表一次。
空間複雜度: O(N * M)，哈希桌所需。

思路2

使用字首樹(Trie)，建立一個按照prefix建立的樹，並記錄數量。

實作2

class Solution
{
private:
    TrieNode trie;
public:
    vector<int> sumPrefixScores(const vector<string>& words)
    {
        const int n = (int) words.size();
        // prefix -> count
        unordered_map<string_view, int> um;
        for (const auto& word : words)
        {
            // For every word, insert prefix to trie
            struct TrieNode* curr = &trie;
            for(const auto& c : word)
            {
                if(!curr->next[c - 'a'])
                    curr->next[c - 'a'] = new TrieNode();
                curr = curr->next[c - 'a'];
                curr->count++;
            }
        }

        vector<int> res(n, 0);
        for (int i = 0; i < n; i++)
        {
            string_view word = words[i];
            // For every word, search prefix in trie
            struct TrieNode* curr = &trie;
            for(const auto& c : word)
            {
                curr = curr->next[c - 'a'];
                res[i] += curr->count;
            }
        }
        
        // TODO
        // DFS to delete Trie
        
        return res;
    }
};

結果2

Accepted
38/38 cases passed (626 ms)
Your runtime beats 37.4 % of cpp submissions
Your memory usage beats 63.84 % of cpp submissions (704.2 MB)

總結

Time Submitted	Status	Runtime	Memory	Language
09/25/2024 20:56	Accepted	626 ms	704.2 MB	cpp
09/25/2024 19:46	Accepted	2723 ms	236 MB	cpp
09/25/2024 19:44	Time Limit Exceeded	N/A	N/A	cpp
09/25/2024 19:42	Time Limit Exceeded	N/A	N/A	cpp
09/25/2024 19:28	Time Limit Exceeded	N/A	N/A	cpp